Automated Generation of High-Performance Large-Scale Matrix Multiplication Accelerator on FPGA
Abstract
Matrix multiplication (MM) is a key linear algebra routine that is widely used in many application areas. In this work we present a high-performance single-precision dense MM accelerator for FPGAs, together with an automatic generator that produces the accelerator with high throughput and high resource efficiency from hardware and MM workload specifications. The accelerator adopts the linear systolic array as its basic building block and uses an optimized architecture that integrates several such blocks. The size and the number of blocks are parameterized, allowing the user to search for the optimal design parameters through automatic design space exploration. The accelerator is tested on the Xilinx VC709 evaluation board and shows a peak performance of 198.1 GFLOPS.
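The abstract does not describe the dataflow inside a block, so the following is only a minimal functional sketch of how a linear systolic array can compute a dense product. It assumes one common mapping in which each processing element (PE) holds one column of B and accumulates the matching column of C while the elements of A are passed from PE to PE. The function name `linear_systolic_matmul` and the parameter `num_pes` are hypothetical and are not taken from the paper.

```python
def linear_systolic_matmul(a, b, num_pes):
    """Compute C = A x B with a software model of a linear chain of PEs.

    Assumed mapping (not from the paper): PE p owns one column of B and the
    matching column of C. Elements of A enter PE 0 one per cycle and move one
    PE to the right every cycle. If B has more columns than there are PEs,
    the columns are processed in tiles of width num_pes.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    # The input stream: every element of A, tagged with its (row, k) index.
    stream = [(i, k, a[i][k]) for i in range(m) for k in range(k_dim)]

    for col0 in range(0, n, num_pes):
        width = min(num_pes, n - col0)      # PEs used for this column tile
        pipeline = [None] * width           # value currently held at each PE

        # Run until the last element has passed through the last PE.
        for cycle in range(len(stream) + width - 1):
            incoming = stream[cycle] if cycle < len(stream) else None
            pipeline = [incoming] + pipeline[:-1]   # shift right by one PE

            # Every PE performs one multiply-accumulate per cycle.
            for p, item in enumerate(pipeline):
                if item is not None:
                    i, k, value = item
                    c[i][col0 + p] += value * b[k][col0 + p]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[1, 0, 2, 1], [0, 1, 3, 1], [1, 1, 0, 2]]
    print(linear_systolic_matmul(a, b, num_pes=2))
```

In this model `num_pes` plays the role of the block size mentioned in the abstract: a longer chain processes more columns of B per pass but needs more multiply-accumulate units, which is the kind of trade-off a design space exploration would weigh against the available FPGA resources.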
Similar Resources
A High Performance FPGA-Based Accelerator for BLAS Library Implementation
This paper describes the implementation and the performance analysis of a hardware accelerator for the BLAS library matrix multiplication operation. The accelerator is based on a dual-FPGA board and on a BLAS software library implementation that makes use of the FPGA-based hardware. In order to evaluate the performance of such a system, we implemented the matrix multiplication operation (BLAS “dg...
FPGA accelerator for floating-point matrix multiplication
This study treats the architecture and implementation of an FPGA accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering...
Random access schemes for efficient FPGA SpMV acceleration
Utilizing hardware resources efficiently is vital to building the future generation of high-performance computing systems. The sparse matrix – dense vector multiplication (SpMV) kernel, which is notorious for its poor efficiency on conventional processors, is a key component in many scientific computing applications and increasing SpMV efficiency can contribute significantly to improving overal...
FPGA based dataflow accelerator for large matrix multiplication
Real-world numerical applications often require a huge number of calculations to be done in a short time. The best way to speed up these applications is to exploit a huge amount of data parallelism by parallelizing independent calculations. Multi-core processors do not have enough resources to achieve any significant utilization of available data parallelism. Instead of adding new CPUs, addition ...
Energy-Efficient Design of Kernel Applications for FPGAs Through Domain-Specific Modeling
Because of their high performance and flexibility, FPGAs are an attractive option for use in embedded systems, where both high performance and low energy consumption are important. Therefore, it is important to create FPGA designs that are not only high performance but also low energy. The flexibility of FPGAs facilitates their high performance, but also makes it difficult to design for them. T...